22 research outputs found
On the Role of Social Identity and Cohesion in Characterizing Online Social Communities
Two prevailing theories for explaining social group or community structure
are cohesion and identity. The social cohesion approach posits that social
groups arise out of an aggregation of individuals that have mutual
interpersonal attraction as they share common characteristics. These
characteristics can range from common interests to kinship ties and from social
values to ethnic backgrounds. In contrast, the social identity approach posits
that an individual is likely to join a group based on an intrinsic
self-evaluation at a cognitive or perceptual level. In other words, group
members typically share an awareness of a common category membership.
In this work we seek to understand the role of these two contrasting theories
in explaining the behavior and stability of social communities on Twitter. A
specific focal point of our work is to understand the role of these theories in
disparate contexts ranging from disaster response to socio-political activism.
We extract social identity and social cohesion features-of-interest for large
scale datasets of five real-world events and examine the effectiveness of such
features in capturing behavioral characteristics and the stability of groups.
We also propose a novel measure of social group sustainability based on the
divergence in group discussion. Our main findings are: 1) Sharing of social
identities (especially physical location) among group members has a positive
impact on group sustainability, 2) Structural cohesion (represented by high
group density and low average shortest path length) is a strong indicator of
group sustainability, and 3) Event characteristics play a role in shaping group
sustainability, as social groups in transient events behave differently from
groups in events that last longer.
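
The two structural cohesion indicators named in the abstract above, group density and average shortest path length, can be sketched as follows. This is only a minimal illustration and not the authors' feature extraction code: it assumes an undirected, unweighted group graph given as a list of unique node pairs, and the function names are hypothetical.

```python
from collections import deque


def density(nodes, edges):
    """Fraction of possible undirected edges that are present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return 2 * len(edges) / (n * (n - 1))


def average_shortest_path_length(nodes, edges):
    """Mean hop distance over all connected ordered pairs (BFS per node)."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total, pairs = 0, 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1  # reachable nodes other than the source
    return total / pairs if pairs else float("inf")


# A four-node path graph: sparse and comparatively "long".
nodes = ["a", "b", "c", "d"]
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(density(nodes, path_edges))
print(average_shortest_path_length(nodes, path_edges))
```

Under the abstract's reading, a group with higher density and a lower average shortest path length would count as more structurally cohesive.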
MusiXplora: Visualizing Geospatial Data in the Musicological Domain
The musiXplora is an interactive and multimodal tool for the domain of musicology, developed in a
collaborative and interdisciplinary fashion. It serves as a research environment that, on the one hand, links large data
collections on musicians, musical instruments, events and more,
and, on the other hand, offers a set of visualizations which allow
users to explore and analyze these data sets comprehensively.
In this paper, we discuss our recent work to emphasize the
relevance of geovisualizations in the musicological domain and
provide detailed insights into how the musiXplora can be
used to address geospatial research questions. We introduce
two distinct use cases and discuss how musicologists can use
the musiXplora’s geovisualizations as distant-reading tools.
Thereby we demonstrate how the musiXplora can contribute to
the confirmation of existing hypotheses and to the formulation
of new ones.
Efficient Community Detection in Large Networks using Content and Links
In this paper, we discuss a simple approach to combining content and link information in graph structures for the purpose of community discovery, a fundamental task in network analysis. Our approach hinges on the basic intuition that many networks contain noise in the link structure and that content information can help strengthen the community signal. This enables one to eliminate the impact of noise (false positives and false negatives), which is particularly prevalent in online social networks and Web-scale information networks. Specifically, we introduce a measure of signal strength between two nodes in the network by fusing their link strength with content similarity. Link strength is estimated based on whether the link is likely (with high probability) to reside within a community. Content similarity is estimated through cosine similarity or the Jaccard coefficient. We discuss a simple mechanism for fusing content and link similarity. We then present a biased edge sampling procedure which retains edges that are locally relevant for each graph node. The resulting backbone graph can be clustered using standard community discovery algorithms such as Metis and Markov clustering. Through extensive experiments on multiple real-world datasets (Flickr, Wikipedia and CiteSeer) with varying sizes and characteristics, we demonstrate the effectiveness and efficiency of our methods over state-of-the-art learning and mining approaches, several of which also attempt to combine link and content analysis for the purposes of community discovery. In particular, we always find a qualitative benefit when combining content with link analysis. Additionally, our biased graph sampling approach realizes a quantitative benefit in that it is typically several orders of magnitude faster than competing approaches.
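
The idea of fusing link strength with content similarity and then keeping only locally relevant edges can be illustrated with a small sketch. The abstract does not give the fusion formula or the sampling rule, so everything specific here is an assumption made for illustration: the weighted average with parameter `alpha`, the "keep each node's top-k edges" rule, the precomputed `link_strength` input, and all function names.

```python
def jaccard(a, b):
    """Jaccard coefficient of two term collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fused_strength(link_strength, content_sim, alpha=0.5):
    """Assumed fusion rule: weighted average of link and content signals."""
    return alpha * link_strength + (1 - alpha) * content_sim


def backbone(edges, content, link_strength, k=1, alpha=0.5):
    """Keep, for every node, its k strongest incident edges (assumed rule)."""
    incident = {}
    for u, v in edges:
        score = fused_strength(
            link_strength[(u, v)], jaccard(content[u], content[v]), alpha
        )
        incident.setdefault(u, []).append((score, (u, v)))
        incident.setdefault(v, []).append((score, (u, v)))
    kept = set()
    for items in incident.values():
        items.sort(key=lambda item: item[0], reverse=True)
        kept.update(edge for _, edge in items[:k])
    return kept


# Toy data: a and b share content; c is topically unrelated.
content = {"a": {"x", "y"}, "b": {"x", "y"}, "c": {"z"}}
edges = [("a", "b"), ("b", "c"), ("a", "c")]
link_strength = {("a", "b"): 1.0, ("b", "c"): 0.2, ("a", "c"): 0.4}
print(backbone(edges, content, link_strength, k=1))
```

The retained edge set plays the role of the backbone graph, which would then be passed to a standard clustering algorithm.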
Prediction of Topic Volume on Twitter
We discuss an approach for predicting microscopic (individual) and macroscopic (collective) user behavioral patterns with respect to specific trending topics on Twitter. Going beyond previous efforts that have analyzed driving factors in whether and when a user will publish topic-relevant tweets, here we seek to predict the strength of content generation, which allows a more accurate understanding of Twitter users' behavior and more effective utilization of the online social network for diffusing information. Unlike traditional approaches, we integrate multiple dimensions into one regression-based prediction framework covering network structure, user interaction, content characteristics and past activity. Experimental results on three large Twitter datasets demonstrate the efficacy of our proposed method. We find in particular that combining features from multiple aspects (especially past activity information and network features) yields the best performance. Furthermore, we observe that leveraging more past information leads to better prediction performance, although the marginal benefit is diminishing.
Efficient Skyline Computation in Metric Space
Given a set of n query points in a general metric space, a metric space skyline (MSS) query asks which points in the database are closest to all of these query points. Here, a point p is considered one of the closest points if no other point in the database has a smaller or equal distance to all the query points. This problem is a direct generalization of the recently proposed spatial-skyline query problem, where all the points are located in two- or three-dimensional Euclidean space. It is also closely related to the nearest neighbor (NN) query, the range query and the common skyline query problem. In this paper, we develop new algorithms to aggressively prune non-skyline points from the search space. We also contribute two new optimization techniques to reduce the number of distance computations and dominance tests. Our experimental evaluation shows the effectiveness and efficiency of our approach.
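
The dominance test behind a metric space skyline query can be sketched with a brute-force baseline. This is not the pruning algorithm of the paper, only a quadratic reference version. It assumes the usual skyline convention that a dominating point must be strictly closer to at least one query point, and the function names are hypothetical.

```python
def dominates(p, q, queries, dist):
    """True if p is at least as close as q to every query point
    and strictly closer to at least one (assumed convention)."""
    dp = [dist(p, x) for x in queries]
    dq = [dist(q, x) for x in queries]
    no_worse = all(a <= b for a, b in zip(dp, dq))
    strictly_better = any(a < b for a, b in zip(dp, dq))
    return no_worse and strictly_better


def metric_skyline(points, queries, dist):
    """Brute force: keep every point that no other point dominates."""
    return [
        p for p in points
        if not any(dominates(o, p, queries, dist) for o in points)
    ]


# Points on a line with absolute difference as the metric.
queries = [0, 10]
points = [2, 5, 12, -3]
print(metric_skyline(points, queries, lambda a, b: abs(a - b)))
# -3 is dropped because 2 is closer to both query points.
```

Only the distance function is used, so the same sketch applies to any metric, which is what distinguishes the metric space setting from the Euclidean spatial-skyline case.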
3-HOP: a high-compression indexing scheme for reachability query
Reachability queries on large directed graphs have attracted much attention recently. Existing work either uses spanning structures, such as chains or trees, to compress the complete transitive closure, or utilizes the 2-hop strategy to describe the reachability. Almost all of these approaches work well for very sparse graphs. However, the challenging problem is that as the ratio of the number of edges to the number of vertices increases, the size of the compressed transitive closure grows very large. In this paper, we propose a new 3-hop indexing scheme for directed graphs with higher density. The basic idea of 3-hop indexing is to use chain structures in combination with hops to minimize the number of structures that must be indexed. Technically, our goal is to find a 3-hop scheme over dense DAGs (directed acyclic graphs) with minimum index size. We develop an efficient algorithm to discover a transitive closure contour, which yields near-optimal index size. Empirical studies show that our 3-hop scheme has a much smaller index size than state-of-the-art reachability query schemes such as 2-hop and path-tree when DAGs are not very sparse, while our query time is close to that of path-tree, which is considered to be one of the best reachability query schemes.
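
To make concrete what such indexing schemes compress, the following sketch materializes the full transitive closure of a small DAG and answers reachability by set lookup. It is the uncompressed baseline, not the 3-hop scheme; the adjacency-dictionary input format and the function name are assumptions for illustration, and the recursion is only suitable for small graphs.

```python
def transitive_closure(graph):
    """Map every node of a DAG to the set of nodes reachable from it."""
    closure = {}

    def visit(u):
        if u in closure:  # memoized: each node is expanded once
            return closure[u]
        reach = set()
        for v in graph.get(u, ()):
            reach.add(v)
            reach |= visit(v)
        closure[u] = reach
        return reach

    for u in graph:
        visit(u)
    return closure


dag = {"a": ["b"], "b": ["c"], "c": [], "d": ["c"]}
closure = transitive_closure(dag)
print("c" in closure["a"])  # reachability query: can a reach c?
print(sum(len(r) for r in closure.values()))  # number of stored pairs
```

The number of stored pairs grows quickly as the edge-to-vertex ratio rises, which is the growth the abstract identifies as the challenge for denser graphs.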